Machine Learning

This week, we introduce the core idea of teaching a computer to learn concepts using data—without being explicitly programmed.

We are going to start by covering linear regression with one variable. Linear regression predicts a real-valued output based on an input value. We discuss the application of linear regression to housing price prediction, present the notion of a cost function, and introduce the gradient descent method for learning.

Week 1

Linear Regression with One Variable - Predicting a single output value from a single input value.

Hypothesis Function: $$h_\theta(x) = \theta_0 + \theta_1 x$$

We give $h_\theta$ values for $\theta_0$ and $\theta_1$ to get our output 'y'. In other words, we are trying to create a function $h_\theta$ that can reliably map our input data (the x's) to our output data (the y's).
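For example, with made-up parameters $\theta_0 = 1$ and $\theta_1 = 2$, the hypothesis is $h_\theta(x) = 1 + 2x$, and it predicts $h_\theta(3) = 7$ for the input $x = 3$.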

Cost Function: $$J(\theta_0, \theta_1) = \dfrac {1}{2m} \displaystyle \sum _{i=1}^m \left (h_\theta (x^{(i)}) - y^{(i)} \right)^2$$

To break it apart, it is $\frac{1}{2} \bar{x}$ where $\bar{x}$ is the mean of the squares of $h_\theta (x^{(i)}) - y^{(i)}$, or the difference between the predicted value and the actual value.
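A minimal NumPy sketch of this cost function (the function and variable names are illustrative, not from the course):

In [ ]:
import numpy as np

# Cost function J(theta_0, theta_1) for the hypothesis h_theta(x) = theta_0 + theta_1 * x.
def cost(theta_0, theta_1, x, y):
    m = len(y)
    predictions = theta_0 + theta_1 * x       # h_theta(x^(i)) for every training example
    squared_errors = (predictions - y) ** 2   # (h_theta(x^(i)) - y^(i))^2
    return squared_errors.sum() / (2 * m)     # (1 / 2m) * sum of squared errors

print(cost(0.0, 1.0, np.array([1, 2, 3]), np.array([1, 2, 3])))   # 0.0, a perfect fit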

The accuracy of the hypothesis function is measured by the cost function.

Gradient Descent: using the derivative of our cost function to iteratively step toward its minimum

$\alpha$, the learning rate

Gradient descent equation:

repeat until convergence:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ for j=0 and j=1

Or intuitively,

repeat until convergence:

$\theta_j := \theta_j - \alpha [\text{slope of the tangent, i.e. the derivative}]$
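A tiny runnable illustration of that intuition (a sketch on a toy one-parameter cost $J(\theta) = \theta^2$, not the linear-regression cost): subtracting $\alpha$ times the slope moves $\theta$ toward the minimum from either side.

In [ ]:
# Toy example: minimize J(theta) = theta^2 with gradient descent.
# alpha and the starting point are arbitrary illustrative choices.
def dJ(theta):
    return 2 * theta          # derivative (slope of the tangent) of theta^2

alpha = 0.3
theta = 5.0                   # start to the right of the minimum at 0
for _ in range(10):
    theta = theta - alpha * dJ(theta)
print(theta)                  # close to 0, the minimizer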

Gradient Descent for Linear Regression

The point of all this is that if we start with a guess for our hypothesis and then repeatedly apply these gradient descent equations, our hypothesis will become more and more accurate.
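Substituting the linear-regression cost function into the update rule and working out the partial derivatives gives the concrete updates (both parameters are updated simultaneously on each iteration):

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right)$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^m \left(h_\theta(x^{(i)}) - y^{(i)}\right) x^{(i)}$$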

Classification & Regression

Model Representation:

Note: indices in tables and matrices are 1-based, not 0-based

m = number of training examples

x's = "input" variable/features

y's = "output" variable/"target" variable

(x,y) : single training example (a row in a table)

$(x^{(i)},y^{(i)})$ : $i^{th}$ training example
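For example, with the four training examples used in the code cell further below (inputs 1 to 4, outputs 6, 5, 7, 10): $m = 4$, $(x^{(1)}, y^{(1)}) = (1, 6)$, and $(x^{(3)}, y^{(3)}) = (3, 7)$.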


In [2]:
from notebook_utils import *
IMG('univariate-linear-regression.png')


Out[2]:

Given training examples (x, y), choose $\theta_0$ and $\theta_1$ so that $h_\theta(x)$ is close to y for our training data


In [12]:
IMG('cost-function.png', r=True)


Out[12]:

Below: "Find the values of $\theta_0$ & $\theta_1$ so that the average $\frac12m$ * the $\Sigma$ of squared errors is minimized. (Khan Academy - Squared error of regression)


In [11]:
IMG('minimize.png')


Out[11]:

Cost Function: $J(\theta_0,\theta_1)$

Derivatives


In [10]:
IMG('derivative.png')


Out[10]:

Gradient Descent + Mean Squared Error = our first Learning Algorithm (Linear Regression)


In [14]:
IMG('gradient-descent.png')


Out[14]:

In [17]:
IMG('gradient-descent-algorithm.png',r=True)


Out[17]:
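Putting the pieces together: a runnable sketch of batch gradient descent on the same four training examples used in the normal-equation cell below ($\alpha$ and the iteration count are arbitrary choices, not values from the course). It should land near $\theta_0 = 3.5$, $\theta_1 = 1.4$.

In [ ]:
import numpy as np

x = np.array([1, 2, 3, 4], dtype=float)
y = np.array([6, 5, 7, 10], dtype=float)
m = len(y)

theta_0, theta_1 = 0.0, 0.0
alpha = 0.1
for _ in range(5000):
    errors = (theta_0 + theta_1 * x) - y      # h_theta(x^(i)) - y^(i)
    grad_0 = errors.sum() / m                 # dJ/dtheta_0
    grad_1 = (errors * x).sum() / m           # dJ/dtheta_1
    theta_0, theta_1 = theta_0 - alpha * grad_0, theta_1 - alpha * grad_1

print(theta_0, theta_1)                       # approximately 3.5 and 1.4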

Linear Regression - Simplified


In [18]:
IMG('linear-regression-simple.png')


Out[18]:

In [19]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

m = 4                                        # number of training examples
n = 2                                        # number of parameters (intercept and slope)
data = np.array([                            # each row is one (x, y) training example
    [1, 6],
    [2, 5],
    [3, 7],
    [4, 10]
])
X = np.matrix([np.ones(m), data[:, 0]]).T    # design matrix: column of ones plus the inputs
y = np.matrix(data[:, 1]).T                  # outputs as a column vector
betaHat = (X.T * X).I * X.T * y              # normal equation: (X'X)^-1 X'y

print(betaHat)

plt.figure(1)
xx = np.linspace(0, 5, 2)
yy = np.array(betaHat[0] + betaHat[1] * xx)  # fitted line: theta_0 + theta_1 * x
plt.plot(xx, yy.T, color='b')

plt.scatter(data[:, 0], data[:, 1], color='r')
plt.show()


[[ 3.5]
 [ 1.4]]
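The same fit with plain ndarrays (np.matrix is discouraged in newer NumPy); solving the normal equation $(X^TX)\beta = X^Ty$ with np.linalg.solve avoids forming the inverse explicitly. A sketch:

In [ ]:
import numpy as np

data = np.array([[1, 6], [2, 5], [3, 7], [4, 10]], dtype=float)
X = np.column_stack([np.ones(len(data)), data[:, 0]])   # design matrix
y = data[:, 1]
beta = np.linalg.solve(X.T @ X, X.T @ y)                # normal equation
print(beta)                                             # [3.5 1.4]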

Linear Algebra Review

Matrix: rectangular array of numbers

4 x 2 ( $\mathbb{R}^{4\times2}$ ) and 2 x 3 ( $\mathbb{R}^{2\times3}$ ) matrices below:

$\left[ \begin{array}{ccc} 1402 & 191 \\ 1371 & 821 \\ 949 & 1437 \\ 147 & 1448 \end{array} \right] \left[ \begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \end{array} \right] $

Vector: an n x 1 matrix; the 4 x 1 example below is in $\mathbb{R}^4$

$ \left[ \begin{array}{c} 460 \\ 232 \\ 315 \\ 178 \end{array} \right] $

Note: assume vectors are 1-indexed

A, B, C, X ← usually matrices

a, b, c, x ← usually vectors or scalars

Matrix Addition

Can only add matrices of the same dimensions, i.e. $\mathbb{R}^{4\times2} + \mathbb{R}^{4\times2}$

Just add the values at each position

Scalar Multiplication & Division

$3 \times \left[ \begin{array}{cc} 1 & 0 \\ 2 & 5 \\ 3 & 1 \\ \end{array} \right] = \left[ \begin{array}{cc} 3 & 0 \\ 6 & 15 \\ 9 & 3 \\ \end{array} \right]$
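A quick NumPy check of both operations (B below is an arbitrary matrix of the same dimensions as A):

In [ ]:
import numpy as np

A = np.array([[1, 0], [2, 5], [3, 1]])
B = np.array([[4, 2], [0, 1], [5, 3]])   # illustrative values, same 3 x 2 shape
print(A + B)     # add the values at each position
print(3 * A)     # [[3, 0], [6, 15], [9, 3]], matching the example above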

Multiplying a matrix and a vector


In [23]:
IMG('matrix-vector-multiplication.png',r=True)


Out[23]:

Using Octave's vectorized operations to compute linear regression, compared to an iterative approach: a faster algorithm and less code


In [22]:
IMG('octave-vs-iterative.png')


Out[22]:
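The same contrast in NumPy rather than Octave (a sketch; the parameter values are illustrative): the vectorized version replaces the explicit loop with one matrix-vector product.

In [ ]:
import numpy as np

theta = np.array([3.5, 1.4])                        # [theta_0, theta_1] (illustrative)
sizes = np.array([1.0, 2.0, 3.0, 4.0])              # input feature x
X = np.column_stack([np.ones(len(sizes)), sizes])   # prepend a column of ones

predictions_vectorized = X @ theta                  # one matrix-vector product

predictions_loop = np.zeros(len(sizes))             # equivalent iterative version
for i in range(len(sizes)):
    predictions_loop[i] = theta[0] + theta[1] * sizes[i]

print(np.allclose(predictions_vectorized, predictions_loop))   # True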

Matrix * Matrix Multiplication


In [27]:
IMG('matrix-matrix-multiplication.png')


Out[27]:

Split the second matrix into column vectors, do matrix-vector multiplication with each, then combine the results into a new matrix.

First column of the result:

1·1 + 3·0 + 2·5 = 11

4·1 + 0·0 + 1·5 = 9

Second column of the result:

1·3 + 3·1 + 2·2 = 10

4·3 + 0·1 + 1·2 = 14

$\left[ \begin{array}{cc} 11 & 10 \\ 9 & 14 \\ \end{array} \right]$
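A quick NumPy check of that product (assuming, from the dot products above, that the factors are $\left[\begin{array}{ccc}1&3&2\\4&0&1\end{array}\right]$ and $\left[\begin{array}{cc}1&3\\0&1\\5&2\end{array}\right]$):

In [ ]:
import numpy as np

A = np.array([[1, 3, 2],
              [4, 0, 1]])
B = np.array([[1, 3],
              [0, 1],
              [5, 2]])
print(A @ B)   # [[11 10]
               #  [ 9 14]]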

Another Example of Matrix x Matrix


In [30]:
IMG('matrix-matrix-multi2.png')


Out[30]:

We can now use matrix multiplication to quickly (and efficiently) compute 3 competing hypotheses for 4 different house sizes to get all 12 predictions!

We can also, of course, use libraries (e.g. NumPy) to compute these.

See image below:


In [32]:
IMG('matrix-matrix-multiplication-use.png')


Out[32]:
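As noted above, the same kind of computation in NumPy (a sketch; the house sizes and the three hypotheses are illustrative values, not necessarily those in the image): stacking the parameter vectors as columns turns the 3 x 4 = 12 predictions into a single matrix product.

In [ ]:
import numpy as np

sizes = np.array([2104, 1416, 1534, 852])             # 4 house sizes (illustrative)
X = np.column_stack([np.ones(len(sizes)), sizes])     # 4 x 2: column of ones plus the sizes
Theta = np.array([[-40.0, 200.0, -150.0],             # theta_0 for each of 3 hypotheses
                  [0.25, 0.10, 0.40]])                # theta_1 for each of 3 hypotheses
predictions = X @ Theta                               # 4 x 3 matrix = 12 predictions
print(predictions)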

WATCH OUT:

Matrix multiplication is NOT commutative: $A \times B \ne B \times A$

Matrix multiplication IS associative.

Identity Matrix: 1s down the diagonal of an n x n matrix. Denoted: $I$ or $I_{n\times n}$ $\left[\begin{array}{ccc}1&0&0\\0&1&0\\0&0&1\end{array}\right]$
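A quick NumPy illustration of these three facts, using arbitrary 2 x 2 matrices:

In [ ]:
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [0, 3]])
I = np.eye(2)

print(np.array_equal(A @ B, B @ A))                         # False: not commutative
print(np.array_equal((A @ B) @ C, A @ (B @ C)))             # True: associative
print(np.array_equal(A @ I, A), np.array_equal(I @ A, A))   # True True: A I = I A = A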


Inverses and Transpose

Only square matrices have inverses

If A is an m x m matrix, and if it has an inverse,

$A(A^{-1}) = A^{-1}A = I$

Example:


In [34]:
IMG('inverse-matrices.png')


Out[34]:
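A quick NumPy check of $A(A^{-1}) = A^{-1}A = I$, using an arbitrary invertible 2 x 2 matrix:

In [ ]:
import numpy as np

A = np.array([[4.0, 3.0],
              [1.0, 1.0]])        # det = 1, so A is invertible
A_inv = np.linalg.inv(A)
print(A @ A_inv)                  # the 2 x 2 identity (up to floating-point error)
print(A_inv @ A)                  # same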

Matrix Transpose


In [35]:
IMG('matrix-transpose.png')


Out[35]:

Example problem Quiz2q4:

$u=\left[\begin{array}{c}3\\-5\\4\end{array}\right]$ and $v=\left[\begin{array}{c}1\\2\\5\end{array}\right]$

What is $u^T$ (u Transposed)?

Answer:

$u^T=\left[\begin{array}{ccc}3&-5&4\end{array}\right]$ so,

Given that an m x n matrix multiplied by an n x m matrix yields an m x m matrix,

a 1 x 3 matrix multiplied by a 3 x 1 matrix results in a 1 x 1 matrix (a scalar):


In [36]:
3*1 + -5*2 + 4*5


Out[36]:
13
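The same computation as a 1 x 3 by 3 x 1 product in NumPy:

In [ ]:
import numpy as np

u = np.array([3, -5, 4])
v = np.array([1, 2, 5])
print(u @ v)          # 13, the single entry of the 1 x 1 result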